The forkhead domain is a monomeric DNA binding
motif that defines a rapidly growing family of
eukaryotic transcriptional regulators. Genetic and
biochemical data suggest a central role in embryonic
development for genes encoding forkhead proteins. We
have used PCR and low stringency hybridization to
isolate clones from human cDNA and genomic libraries
that represent seven novel forkhead genes, freac-1 to
freac-7. The spatial patterns of expression for the seven
genes range from specific for a single tissue to
nearly ubiquitous. The DNA binding specificities of
four of the FREAC proteins were determined by
selection of binding sites from random sequence
oligonucleotides. The binding sites for all four FREAC
proteins share a core sequence, RTAAAYA, but differ
in the positions flanking the core. Domain swaps
between two FREAC proteins identified two subregions
within the forkhead domain that are responsible for
differences in DNA binding specificity. Applying a
circular permutation assay, we show that binding of
FREAC proteins to their cognate sites results in
bending of the DNA at an angle of 80-90°.

Key words: DNA bending/DNA binding/forkhead/
transcription factor



Introduction 

The key event in gene regulation, control of initiation
of transcription, depends on the coordinated action of
sequence-specific DNA binding proteins. Combinatorial
effects generate independent regulation and cell type-
specific expression for genes far more numerous than the
transcriptional regulators themselves. Several mechanisms
contribute to the complexity of transcriptional regulation
by trans-acting factors. The formation of heterodimers
between two DNA binding proteins can alter their ability
to activate transcription, their affinity for DNA, their sequence
specificity and the stability of the dimer itself (Lamb and
McKnight, 1991). Overlapping, yet distinct, binding site
preferences among related transcription factors allow two
promoters to utilize the same set of factors but with
different relative affinities. Synergy or antagonism between
transcription factors can act at the level of DNA binding
(Grueneberg et al., 1992) or transcriptional activation (Lin
et al., 1990; Herschlag and Johnson, 1993), and can make
the activity of a certain DNA binding protein highly
context dependent (Carlsson et al., 1993; Giese and
Grosschedl, 1993). Alterations in DNA topology also
generate context dependence; some regulatory proteins
introduce sharp bends in the DNA, the effect of which
depends on the position and orientation of the binding
site (Natesan and Gilman, 1993; Grosschedl et al., 1994).
The modular structure of eukaryotic transcription fac-
tors, in which distinct functions such as DNA binding and
transcriptional activation are often contained within non-
overlapping protein domains, suggests that any class of
DNA binding motif would be capable of mediating any
kind of biological signalling. Nevertheless, regulatory
proteins with the same basic design in their DNA binding
domains also tend to be related [...] steroid [...].
This may reflect that evolution prefers to move in small
steps, and in creating a new entity will first look to the
closest relative, but also that the structure of the DNA
binding domain is not irrelevant for overall function.
The forkhead domain is a 100 amino acid motif that
defines a rapidly growing family of DNA binding proteins.
First identified as a region of homology between the
product of the homeotic Drosophila gene fork head and
hepatocyte nuclear factor 3 (HNF3) from rat (Weigel and
Jäckle, 1990; Lai et al., 1991), the forkhead motif has
since been found in genes from a number of metazoans
and in Saccharomyces. Some of these genes have been
isolated based on their homology to fork head/HNF3
and little is known about their function. This group
includes five fork head-related genes from Drosophila
(FD1-FD5) (Häcker et al., 1992), nine from rat
(HFH1-HFH7 and HFH-B2 and HFH-B3; Clevidence
et al., 1993), six from mouse (fkh-1 to fkh-6) (Kaestner et al.,
1993) and one from Saccharomyces (HCM1; Bork et al.,
1992), which was also independently isolated as suppressor
of a calmodulin mutation (Zhu et al., 1993).

Other members of the forkhead family have been
identified as genes involved in pattern formation during
embryogenesis. Members of this group include fork head
(Weigel et al., 1989) and sloppy paired (slp1 and
slp2; Grossniklaus et al., 1992; Häcker et al., 1992) from
Drosophila, lin-31 from Caenorhabditis elegans (Miller
et al., 1993) and Axial from zebrafish (Strähle et al.,
1993). Indirect evidence suggests a similar function for a
number of other forkhead genes. The embryonic expression
pattern in mice implies that HNF3α, HNF3β and two
related genes, mf1 and mf2, are involved in the formation
of the body axis and the establishment of the germ layers
during gastrulation (Sasaki and Hogan, 1993). Ectopic
expression of HNF3β in transgenic mouse embryos identi-
fied this gene as a regulator of floor plate development

© Oxford University Press 



(Sasaki and Hogan, 1994). HNF3α is believed to induce [...] notochord, floor
[...] through activation of HNF3β ([...] et al., 1993; Sasaki and Hogan, 1994).

[...] spatial and temporal distribution of their expression [...]
identified as encoding factors that, like HNF3, bind [...]
expressed in terminally differentiated [...] (Li et al., 1992b)
[...] leukaemia virus enhancer [...]
and interleukin binding factor (ILF; Li et al., 1991)
belong to this group.

The forkhead domain of HNF3γ bound to DNA has [...]
additional backbone contacts are [...]

[...] (Enerbäck et al., 1992). Regions in the promoter
[...] in differentiation, and [...]

(M.Hellqvist, L.Samuelsson, S.Pierrou, [...])
[...] helix 3 and another region at the C-terminal part [...]

[...] domain of FREAC proteins to their cognate sites
results in bending of the DNA at an angle of 80-90°.

Results

Cloning of human forkhead genes

[...] objective of identifying new [...]

of the proteins. The same relationship exists between
FREAC-4 and FREAC-5.



[...] patterns of expression are shown by freac-1 and freac-2
[...] in placenta and [...] than the dominant [...]
cells we [...] the expression of [...]

[Fig. 1. Alignment of the forkhead domain amino acid sequences of the
FREAC proteins and other members of the forkhead family; the aligned
sequences are not recoverable.]

[...] the freac genes, freac-6 is the most tissue-specific, with
expression being detected only in kidney. The relative
levels of mRNA between adult and fetal kidney are
reversed compared with freac-4; in the case of freac-6 the
highest level is produced in adult tissue. No expression
could be detected in any of the tissues investigated with
a probe derived from freac-7. Hybridization with a β-
actin probe verified the presence of intact mRNA in each
lane of the blot.

Fig. 2. Northern blots with human polyadenylated RNA analysed with
probes specific for freac-1 to -6 and β-actin. No expression could be
detected in any of the tissues investigated with a probe derived from freac-7.

Human forkhead proteins bend DNA

Fig. 3. Selection of binding sites from random sequence oligonucleotides.
[...] PCR products obtained after each round of amplification [...]
(A) Panel labels: Protein, FREAC-3/GST; Cycles, 0-5; Complex.

Selection of binding sites

The variation in primary structure within the forkhead
domains of the FREAC proteins suggested that they may
have different DNA binding specificities. To address this,
we expressed the forkhead domains of four of the FREAC
proteins in Escherichia coli, and selected high-affinity
binding sites with each one of them from a pool of random
sequence oligonucleotides. Since the pairs FREAC-1/
FREAC-2 and FREAC-4/FREAC-5 are close to identical
within their forkhead domains, we chose to determine the
binding specificity of one representative from each pair
(FREAC-2 and FREAC-4), in addition to FREAC-3 and
FREAC-7. Recombinant FREAC proteins were expressed
as fusions with glutathione S-transferase (GST), and the
ability of GST to bind avidly to glutathione-Sepharose
was used as a way of immobilizing the protein-DNA
complexes. The oligonucleotides used for selection carried
constant flanking sequences for PCR amplification and
cloning, while the central 32 nucleotides were randomized.
The enrichment of FREAC binding sites was followed by
a gelshift assay, and after five or six rounds of selection
and amplification a mixture of high-affinity binding sites
was obtained (Figure 3A). For each FREAC protein,
~30 different sequences were compared and a consensus
sequence calculated. The outcome is exemplified in Figure
3C by sequences selected with FREAC-4-GST. That the
correct sequence had been identified was confirmed by
DNase I footprinting, as shown for FREAC-3
in Figure 3B. Correlation between agreement with the
consensus sequence and high affinity was confirmed with
gelshift and quantitative filter binding assays (data not
shown). To rule out that the GST moiety of the fusion
protein has any affinity for the selected sequences, GST
without anything fused to it was also tested in
gelshifts with selected sequences; in no case was any
interaction between GST and DNA observed. In addition
to sequences that contain a single binding site within the
32 randomized nucleotides, we also found a second class
of selected sequences. These sequences contain sites that
deviate more from the consensus and



instead have two [...]. Our selection procedure was sufficiently stringent to retain
only those oligonucleotides that bound avidly to the
FREAC-GST fusion protein immobilized on Sepharose
[...] length of the [...]
more than one site to contribute to binding. Hence,
[...] of the selected sequences produced the required
[...] three sites. To ensure that weak sites did not contribute,
only monovalent sequences were included in the calcula-
tion of the final consensus summarized in Figure 3D.

[...] selection with the four different FREAC proteins reveals
that they share a requirement for the core sequence
RTAAAYA. The positions within the binding site will be
numbered relative to the first position of the core, i.e. the
R position (Figure 3D). DNase I footprinting (Figure 3B)

[...] DNA is protected from five nucleotides 5' (position -5)
[...] cleavage, indicating DNase I hypersensitivity induced by
binding of the FREAC protein, is seen on the upper strand
at the Y position (+6) and on the opposite strand at
position +[...]. G is the preferred nucleotide at the R
position (+1) for all the proteins except FREAC-7, which
preferentially selected sites with an A [...]
the Y position (+6), all the proteins except FREAC-3
preferred a C over a T. The consensus site for binding of
HNF3 contains the RTAAAYA core with a single nucleo-
tide difference (position +3): CTAAGTCAATA (Costa
et al., 1989). Examination of the structure of HNF3γ
bound to DNA (Clark et al., 1993) shows that the core
sequence consists of the nucleotides where the recognition
helix, helix 3, makes major groove contacts [...].
Judged from the selected binding sites, a close agreement
with the consensus within the core sequence positions is
[...].
No position within the core deviates from the consensus
in >9% of the selected sequences for any of the FREAC
proteins. In particular, the A doublet at positions +[...]
appears to be critical since no exception from A/A was
observed. Outside the core, 3' as well as 5', the FREAC
proteins exhibit different preferences, and variations in
these flanking sequences appear to be better tolerated than
within the core. To investigate the influence of flanking
sequences on specificity, we synthesized oligonucleotides
with different combinations of 5', 3' and core sequences
and tested their binding in a gelshift assay with FREAC
proteins synthesized by in vitro translation. Figure 4B and
C illustrates the influence on binding by FREAC-3 of
differences in the nucleotides flanking the core. Probes A
and B share the same 5' flanking and core sequences, but
differ in their 3' flanking sequences; probe B conforms to
the FREAC-3 consensus, AACA, while the corresponding
positions in probe A have the sequence GCAT. As pre-
dicted from the nucleotide frequencies of the selected
sites, FREAC-3 binds better to probe B than to probe A.
Probes B and E are identical within the core and both
have the optimal 3' sequence for FREAC-3 binding:
AACA. They differ, however, in the 5' flanking sequence
where probe B has the sequence CTTAA and probe E
AGGCC. FREAC-3 binds probe E with much lower
affinity than probe B. This result shows that although the
nucleotide frequencies in the positions 5' of the core
imply that any nucleotide could occupy any position,
certain nucleotide combinations will severely impede
binding of FREAC-3. Probe F, which combines the 5'
sequence of probe E with the 3' sequence of probe A,
fails completely to bind FREAC-3 in spite of its consensus
core sequence, which confirms the importance of nucleo-
tides on either side of the core for high-affinity binding.
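The step of comparing aligned selected sites and calculating a consensus can be illustrated with a short sketch. It is not the procedure used in this work: the four aligned sites below are made-up placeholders, the function names are hypothetical, and the 90% threshold for collapsing a position to one or two bases is an assumption chosen only for illustration.

```python
from collections import Counter

# IUPAC symbols for the single bases and two-base sets handled here.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("AT"): "W", frozenset("CG"): "S",
}


def position_counts(sites):
    """Count bases at each position of equal-length, pre-aligned sites."""
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to equal length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def consensus(sites, threshold=0.9):
    """Return an IUPAC consensus for uppercase A/C/G/T sites.

    At each position the most frequent base is used if it reaches the
    threshold; otherwise the two most frequent bases are combined; if
    even those fall short, the position is reported as N.
    """
    symbols = []
    for counts in position_counts(sites):
        total = sum(counts.values())
        chosen, covered = set(), 0
        for base, n in counts.most_common(2):
            chosen.add(base)
            covered += n
            if covered / total >= threshold:
                break
        if covered / total >= threshold:
            symbols.append(IUPAC[frozenset(chosen)])
        else:
            symbols.append("N")
    return "".join(symbols)


# Hypothetical aligned core sequences, NOT data from the selections.
HYPOTHETICAL_SITES = ["GTAAACA", "ATAAACA", "GTAAATA", "GTAAACA"]
print(consensus(HYPOTHETICAL_SITES))  # RTAAAYA
```

With real selection data the same per-position counts would also show how far the flanking positions are from being invariant, which is the contrast the text draws between core and flanks.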



Domain swaps

When we compared the relative affinities of FREAC-3
and FREAC-4 for probes A and B we found that FREAC-4
has a reversed preference compared with FREAC-3 and
binds better to probe A (Figure 4D). To determine which
subdomains within the forkhead motif mediate recognition
of different parts of the binding site, we expressed chimeric
proteins that consist of various combinations between
FREAC-3 and FREAC-4 (Figure 4A and D). These
proteins, referred to as SWAP-1 to SWAP-8, were trans-
lated in vitro and assayed for binding to four different
probes. Probes A and B have been described above and
differ in the four nucleotides immediately 3' of the core
(+8 to +11). Probes C and D differ only in the Y position
of the core (position +6): C in probe C and T in probe
D. As discussed above, FREAC-3 binds probe B with
higher affinity than probe A, while the reverse is true for
FREAC-4. Of the chimeric proteins, SWAP-1 to SWAP-4
have the same preference as FREAC-3, while SWAP-5 to
SWAP-8 behave like FREAC-4 (Figure 4D). These results
suggest that amino acids close to the C-terminus of the
forkhead domain determine the [...]
with regard to nucleotides 3' of the core. In HNF3γ the
corresponding region comprises the C-terminal half of
wing 2 (Figures 4A and 6), which is dominated by a
stretch of basic amino acids. Within this stretch, three
residues differ between FREAC-3 and FREAC-4 [...]
K-RQ in FREAC-4 (Figures 1 and 4A). These residues
define the C-terminal border of the forkhead homology
and beyond this point the amino acid sequences of
FREAC-3 and FREAC-4 diverge completely. The proteins
used in binding experiments extend into the unique
[...] five and 16 residues, respectively [...]
amino acids in this region influence binding [...].

Probes C and D were bound equally well by FREAC-3,
SWAP-[...], -6, -7 and -8, while FREAC-4 and the remaining
SWAP proteins bound better to probe C (Figure 4D).
Preference for C at the Y position in the core therefore
appears to be encoded by a region in the central part of
the forkhead domain. The only differences in primary
structure between FREAC-3 and FREAC-4 within the
[...] segment [...]
in FREAC-3 versus [...] in FREAC-4
(Figures 1 and 4A). On the supposition that the basic
structure is the same for all forkhead proteins, comparison
with the 3-D structure of HNF3γ shows that the eight
amino acids where the differences occur are located in
the loop between helices 2 and 3 and in the first three
amino acids of helix 3 (Figures 4A and 6).

Fig. 4. (A) [...] wings, W1 and W2, in HNF3γ (Clark et al., 1993) [...]
translated in a reticulocyte lysate [...] Panel labels: W2; SWAP 2&6;
SWAP 3&7; SWAP 4&8. (B) Probes:

Probe  5' flank      core     3' flank
A      GATCC CTTAA   GTAAACA  GCAT GAGATC
B      GATCC CTTAA   GTAAACA  AACA GAGATC
C      GATCC AGACT   GTAAACA  AACA TAGATC
D      GATCC AGACT   GTAAATA  AACA TAGATC
E      GATCC AGGCC   GTAAACA  AACA GAGATC
F      GATCC AGGCC   GTAAACA  GCAT GAGATC

(C) Gelshift assay with four of the probes in (B) and the forkhead domain
of FREAC-3 [...] [35S]methionine in the [...] reactions. Panel labels:
Protein, FREAC-3 or none; Probe, E, A, F, B; Complex; Probe. (D) [...]
ratio between the amount of complex formed with probe A versus probe B.

To investigate if binding of FREAC proteins affects DNA
topology, we performed a circular permutation assay (Wu
and Crothers, 1984). Oligonucleotides containing binding
sites for FREAC-3 and FREAC-4 were cloned into a
vector between two tandem copies of a 375 bp fragment.
Digestion with restriction enzymes generated gelshift
probes identical in size and sequence but with the FREAC
binding site at different positions [...]. Figure
5 shows the result of a gelshift with
probes containing a FREAC-3 site at a variable distance
from the end of the probe. The retarded complex,
representing FREAC-3 bound to DNA, migrates with a
mobility that is inversely correlated to the distance between
the binding site and the end of the probe, a relationship
characteristic of proteins that bend their target DNA
(Wu and Crothers, 1984). We repeated this assay in
polyacrylamide gels with acrylamide concentrations of 6,
8 and 10% and calculated the ratio between the fastest
and slowest migrating species for each gel.
These ratios were then used to estimate the extent of DNA
distortion through linear interpolation [...]
obtained with A-tract DNA standards (Thompson and
Landy, 1988). Independent of the gel concentration used,
the angle of the DNA bend induced by binding of
FREAC-3 was calculated to be between 80 and 90°.
Similar results were obtained with four different binding sites
(A, B and E in Figure 4B, and G, described in Materials
and methods) and with FREAC-4 as well as FREAC-3.
Probe F failed to bind FREAC-3 or FREAC-4 even when
cloned into the circular permutation vector. In contrast to
the other sequences tested, probe F appears to have a
slight intrinsic curvature, as shown by differences in
mobility of the free DNA (data not shown).

Fig. 5. Circular permutation [...] An oligonucleotide containing an
optimal FREAC-3 site, probe G described in Materials and methods,
was cloned between two tandem repeats of a 375 bp fragment and
gelshift probes were generated through digestion with the indicated
restriction enzymes. The FREAC-3 binding site is represented by the
black rectangle. Gelshift assays were performed with FREAC-3-GST
fusion protein and the mobilities of the fastest (BamHI) and slowest
(EcoRV) complexes were used to calculate the DNA bending angle.
Results obtained with a 6% polyacrylamide gel are shown.
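How a mobility ratio from a circular permutation assay translates into an angle can be illustrated with a small sketch. Note that it does not reproduce the interpolation against A-tract standards described above; instead it uses the empirical relation mu_M / mu_E = cos(alpha / 2) proposed by Thompson and Landy (1988), where mu_M is the mobility of the slowest complex (site near the middle of the probe) and mu_E that of the fastest (site near the end). The function name and the example mobilities are hypothetical.

```python
import math


def bend_angle_degrees(mu_slowest, mu_fastest):
    """Estimate a DNA bend angle from relative gel mobilities.

    Uses the empirical relation mu_slowest / mu_fastest = cos(alpha / 2).
    Mobilities may be in any unit as long as both use the same one.
    """
    if mu_fastest <= 0:
        raise ValueError("mobilities must be positive")
    ratio = mu_slowest / mu_fastest
    if not 0.0 < ratio <= 1.0:
        raise ValueError("expected 0 < mu_slowest / mu_fastest <= 1")
    return 2.0 * math.degrees(math.acos(ratio))


# Hypothetical mobilities (not measurements from this work):
# a ratio of 0.766 corresponds to a bend of about 80 degrees,
# a ratio of 0.707 to a bend of about 90 degrees.
for ratio in (0.766, 0.707):
    print(f"ratio {ratio:.3f} -> {bend_angle_degrees(ratio, 1.0):.1f} degrees")
```

Under this relation, the 80-90° range reported here would correspond to slowest/fastest mobility ratios of roughly 0.71-0.77.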

Discussion

We have isolated clones for seven new members of
the forkhead gene family from human DNA libraries.
Sequence alignments of the seven freac genes, together
with published forkhead genes, show that the FREAC
proteins conform to the general pattern within the family
of highly conserved amino acid motifs interspersed with
more variable regions (Figure 1). In the cases of FREAC-1/
FREAC-2 and FREAC-4/FREAC-5, the primary structures
within the forkhead domains are almost identical between
each pair, while amino acid sequences on either side of
the forkhead homology show little or no resemblance.
This may reflect that the proteins are designed to bind
the same set of sequences, but otherwise have distinct
functions. Alternatively, the proteins may be functionally
redundant and the activities exerted by regions outside
the DNA binding domain may have a greater tolerance
with regard to amino acid substitutions, a condition that
would explain the divergence in primary structure seen in
these parts of the proteins. Such apparent flexibility
in primary structure requirements is often seen in the
mutational analysis of transcriptional activation domains;
however, this explanation fails to account for the high
degree of sequence conservation between species gener-
ally found among transcription factors.

In the case of FREAC-1/FREAC-2, the homology
between their DNA binding domains parallels a similarity
in tissue distribution of expression. Both genes are
expressed at fairly high levels in lung and placenta and it
seems reasonable to assume that their target genes are the
same. The forkhead motifs of freac-1 and freac-2 are not
closely related to any other family member and these two
genes appear to form their own subgroup. In contrast, the
predicted amino acid sequence of FREAC-3 is identical
within the forkhead domain to that of fkh-1 (Kaestner
et al., 1993) and frkhda (Sasaki and Hogan, GenBank
accession number L10406). fkh-1 and frkhda have both
been cloned from mouse, are identical throughout, even
at the nucleotide level, and therefore appear to be derived
from the same gene. The sequence of fkh-1 outside the
forkhead domain has not been published; for frkhda,
however, some additional sequence is presented and a
comparison with freac-3 shows that no similarity exists
outside the forkhead homology. Furthermore, fkh-1 is
expressed in brain, heart, kidney and fat, while no expres-
sion is detected in skeletal muscle. freac-3, on the other
hand, has its main site of expression in skeletal muscle,
whereas expression in brain is hardly detectable. This
suggests that freac-3 represents a novel gene, while fkh-1
and frkhda are derived from a gene whose human homo-
logue remains to be cloned. A similar relationship exists
between freac-4 and HFH-B2 (Clevidence et al., 1993).
Within the forkhead domain the predicted amino acid
sequences are identical, but the expression patterns are
distinct. freac-4 expression is specific for kidney and testis,
while HFH-B2 is reported to be expressed exclusively in
brain. Evaluation of how similar these two genes are in
regions other than the DNA binding domains awaits more
sequence information on HFH-B2. freac-4 and HFH-B2
belong to a larger group of genes with closely related
sequences within their forkhead motifs. This group
includes freac-5, HFH-6 (fkh-2), HFH-2 and FD3 (Häcker
et al., 1992; Clevidence et al., 1993; Kaestner et al., 1993).

freac-6 appears to be the human homologue of HFH-3
from rat (Clevidence et al., 1993). Not only are the amino
acid sequences within the forkhead domains identical, but
expression of both genes is restricted to kidney. freac-7
is most closely related to fkh-6 from mouse (Kaestner
et al., 1993) and, based on the limited sequence information
available, the homology appears to be confined to the
forkhead motif. Examples of groups or pairs of genes
encoding proteins with identical or very similar DNA
binding domains thus appear to be common in the forkhead
family, a phenomenon that was first illustrated by the
isolation of the three HNF3 isoforms α, β and γ (Lai
et al., 1991). Alternative splicing could generate distinct
proteins with identical DNA binding domains from the
same gene if exon borders coincide with the [...]
of the DNA binding domain. However, preliminary
analysis of genomic clones for several freac genes (data
not shown), as well as the gene structures of HNF3α, β
and γ (Kaestner et al., 1994), provide evidence against
this hypothesis and we therefore conclude that strong
selection pressures exist that control [...]
within the forkhead domain.

We have determined the binding site specificities of
four FREAC proteins. A core sequence, RTAAAYA, is
common for binding sites selected with all four FREAC
proteins, but each FREAC protein has its own signature
with regard to the preferred nucleotides at the R and Y
positions of the core. However, the flanking sequences
appear to be more important in giving each protein its
specificity, while at the same time being less well defined
than the core. A similar situation is seen in the family of
homeobox proteins which share a requirement for the core
sequence TAAT and where specificity is conveyed by
nucleotides outside the core (Ekker et al., 1991, 1992 [...]).
[...] contained an intact core sequence) clearly demonstrated
the importance of the flanking sequences. In spite of the
fact that sites selected with FREAC-3 have all nucleotides
represented at each of the five positions 5' of the core,
[...] in this part of the binding site had a dramatic
impact on binding. HNF3γ interacts with the DNA 5' of
the core mainly through backbone contacts (Clark et al.,
1993). It seems likely that the general way in which
HNF3γ binds DNA is relevant for the entire family of
forkhead proteins. Interactions with the DNA backbone
may be indirectly dependent on base sequence through
the [...] and helicity of DNA, but [...]
sensitive to single nucleotide substitutions than the direct
base contacts made in the major groove of the core.
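The division of a binding site into 5' flank, core and 3' flank, with positions numbered from the R position (+1), can be made concrete with a short sketch. The probe sequences are the ones listed for probes A and F in Figure 4B as far as they can be read from the text; the function name and the returned field names are hypothetical, and only the given strand is searched.

```python
import re

# RTAAAYA written as a regular expression (R = A or G, Y = C or T).
CORE = re.compile(r"[AG]TAAA[CT]A")


def split_site(seq, n5=5, n3=4):
    """Split a probe into 5' flank, core and 3' flank.

    The five positions 5' of the core correspond to -5..-1, the core
    to +1..+7 and the four positions 3' of the core to +8..+11.
    Returns None if no RTAAAYA core is present on the given strand.
    """
    seq = seq.upper()
    match = CORE.search(seq)
    if match is None:
        return None
    start, end = match.start(), match.end()
    core = match.group(0)
    return {
        "flank5": seq[max(0, start - n5):start],
        "core": core,
        "flank3": seq[end:end + n3],
        "y_base": core[5],  # Y position of the core, i.e. position +6
    }


PROBE_A = "GATCCCTTAAGTAAACAGCATGAGATC"
PROBE_F = "GATCCAGGCCGTAAACAGCATGAGATC"

for name, probe in (("A", PROBE_A), ("F", PROBE_F)):
    print(name, split_site(probe))
```

Both probes carry the same consensus core and the same 3' flank (GCAT) and differ only in the 5' flank (CTTAA versus AGGCC), which is the kind of difference the text identifies as decisive for FREAC-3 binding.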
It is likely that some of the restraints on the flanking
sequences of the binding sites emanate from the dramatic
bending of DNA induced by binding of a FREAC protein.
It should be emphasized, though, that the strong FREAC
binding sites have no intrinsic curvature. In the circular
permutation assay all probes tested migrate with the same
mobility when present as free DNA.

The functional importance of DNA bending is best
understood in prokaryotes. Bacterial regulatory proteins
that bend DNA include transcriptional regulators such
as CAP/CRP and architectural proteins exemplified by
integration host factor (IHF; Wu and Crothers, 1984;
Thompson and Landy, 1988; Moitoso de Vargas et al.,
1989; Zinkel and Crothers, 1991). IHF bends DNA by
140° and facilitates interaction between proteins bound
to distant sites by looping out the intervening DNA.
Transcriptional activation at a distance by AraC (Schleif,
1992) and chromosomal integration of phage lambda
(Moitoso de Vargas et al., 1989) are examples of processes
dependent on IHF-mediated DNA bending.

Among eukaryotic proteins the most severe distortion
of DNA is generated by the HMG domain proteins, of
which the best studied is lymphoid enhancer factor (LEF-1;
Giese et al., 1991; Waterman et al., 1991). LEF has been
proposed to act as an architectural protein in the assembly
of enhancer nucleoprotein complexes (Giese et al., 1992;
Grosschedl et al., 1994), but its mode [...]
distinct from that of IHF; DNA binding/bending is not
sufficient for the activity of LEF and its context-dependent
activation domain can be successfully grafted onto a
heterologous non-bending DNA binding domain (Carlsson
et al., 1993; Giese and Grosschedl, 1993). In contrast,
YY1, a zinc finger protein, appears to fulfil the criteria of
an architectural protein. It acts as a repressor or an
activator of transcription depending on the position and
orientation of its binding site (Natesan and Gilman, 1993).
The mechanism by which YY1 regulates transcription is
[...] through other proteins binding [...]
be functionally replaced by an unrelated DNA bending
protein [...].

Bending of DNA by [...]
has not been reported [...]
two of the FREAC proteins, FREAC-3 and FREAC-4,




Fig. 6. The 'winged helix' structure of HNF3γ bound to DNA, based
on X-ray crystallography as described by Clark et al. (1993). The
three α-helices (H1-H3) and the two wings (W1 and W2) [...]. The
recognition helix (H3) is seen tilting into the major groove of the
DNA in the region of the binding site which corresponds to the core
sequence. The second wing (W2) contacts DNA in the minor groove
on the 5' side of the core sequence. Illustration adapted from Clark
et al. (1993).

interact with four different binding sites, and found the
angle of the DNA in complex with protein to be between
80 and 90° in each case. Based on the high degree of
conservation within the forkhead domain (Figure 1) we
predict that bending of the target site is an intrinsic
characteristic of this class of DNA binding proteins.

A 13mer oligonucleotide cocrystallized with HNF3γ
has a curvature of only 13° (Figure 6; Clark et al., 1993).
This may reflect a difference in binding characteristics
between, on the one hand, FREAC-3 and FREAC-4 and, on the
other, HNF3γ. Alternatively, the discrepancy may reflect
the different experimental methods applied. More work
will be required to resolve the exact nature of complexes
between forkhead proteins and DNA in solution, but
two observations indicate that the X-ray structure could
underestimate [...]. First, the
oligonucleotide used for crystallization [...]
beyond the 3' border of the core. Secondly, two
base pairs in [...]
HNF3γ, which correspond [...]
core, differ from those in the HNF3 consensus and from
the binding site in the transthyretin promoter on which
[...] was based. [...]
conceivable [...] interactions or DNA distortions 3' of the core
were impossible to detect and contacts between DNA and
protein in the 3' [...].
Inspection of the crystal structure [...]
wing 1, W1, of HNF3γ projects beyond the 3' end of the
oligonucleotide and would be [...]
make additional contacts with DNA had a 3' flanking
sequence been present. It is easy to envisage how the two
wings, W1 and W2, could provide the interactions that
would bend the DNA, narrowing the [...]
the recognition helix, H3, in the process. Maybe the loops
of the forkhead domain, rather than being the wings of a
butterfly 'perched on a straight rod' (Brennan, 1993), are
the arms of a brawny beast warping the DNA.
From gelshift assays performed with chimeric proteins
[...] and FREAC-4, we were able to pin-
point two subregions within the forkhead domain that
determine binding site preferences. The relative affinities
for T versus C at the Y position (+6) are determined by
a short stretch of amino acids at the junction between the
[...] loop and helix 3. None of the DNA-protein contacts
identified in HNF3γ will readily explain this connection,
although the base pair that would be involved is one of
the two discussed above that deviate from the HNF3 core
consensus. However, the N-terminus of helix 3 and the
[...] are in close proximity to the base pair at position
+6, which makes the conclusions from the domain
swapping experiments seem logical.
More [...] is [...]
of two probes with different sequences in the flanking
regions [...] of the core. From the way HNF3γ [...]
binding site, the first wing, W1, appears to be in the
best position to interact with the 3' flanking DNA, and
[...] the primary structure of this subdomain to
determine the preference of each chimera. Interestingly, the
relative affinities of the chimeric proteins follow the origin
of the second wing, W2, and non-conserved sequences C-
[...]
the DNA backbone in the minor groove on the 5' side of
the core. Hence, the way W2 influences DNA-protein
interactions 3' of the core is likely to be indirect.
Alternatively, residues outside the forkhead domain on the
C-terminal side could contribute to DNA binding.

S the preparation of tWs article,. DNA binding 
speSes of HNF3, HFH-1 and HFH-2, were published 
: ) bfoverdier W al. (1994). A stretch of 20 ammo acids 
from me middle of helix 2 to five "^^ '"^^^ 
■ was found to contribute to differences, in specificity 
- between HFH-1 and HNF3P- This region endows the 
eight amino acids around the N-termmus of helix 3 tha 
we have identified as the determinant; of the. preference at 
: nucleotide position +6 in the binding sjte^, . -;A 
In conclusion, subtle differences in the nucleotide
sequence within and flanking the core generate diversity
with regard to binding specificities of forkhead proteins.
The amino acids that make direct base contacts are highly
conserved throughout the forkhead family, an evolutionary
conservation matched by the contacted nucleotides in the
core of the binding sites. Indirect manifestations of the
base sequence, such as DNA helicity and topology, appear
to be important for the interactions that create diversity,
interactions that are likely to include DNA backbone
contacts. Amino acids around the N-terminal border of
helix 3 and in wing 2 of the forkhead domain determine
at least part of this specificity. A more thorough analysis
of the structure of forkhead domains that represent distinct
sequence specificities will be necessary to understand the
way this large family of transcription factors [...]

[...] establish if bending of DNA is a general
characteristic of forkhead proteins and, if so, how this
ability influences their function as gene regulators.



Materials and methods 

[...] for sequencing were obtained from Boehringer
Mannheim. Fluorescent DNA sequencing was performed with reagents
from Pharmacia and oligonucleotides were synthesized on a Beckman
[...] cDNA synthesis and cloning kit was purchased from
Stratagene. QUICK-Clone cDNA, multiple-tissue Northern blots and
[...] libraries were acquired from Clontech. 32P-labelled nucleotides
were purchased from Amersham. Oligo(dT) Dynabeads
were purchased from Dynal, rabbit reticulocyte lysate from
Promega, glutathione-Sepharose and poly(dIC) from Pharmacia, Spin-X
from Costar and MetaPhor agarose from FMC.

Isolation and sequencing of cDNA and genomic clones

AAt^GCTftTGACGGCGCAAG were used to amplify forkhead
[...]
RNA as template. Conditions for the PCR were: 95°C, 1 min; 56°C,
[...] min; [...], 3 min; 30 cycles. Products of the expected size were
cloned and sequenced and a PCR product, whose sequence showed that
it was derived from a previously unknown gene encoding a putative
forkhead protein, was used to screen human cDNA and genomic libraries.
[...] library was constructed in the vector [...]
[...] poly(A)+ RNA prepared from the human monocyte cell line THP-1
[...] was prepared from THP-1 cells according
to Chirgwin et al. (1979) and poly(A) selection was performed with
[...] and freac-3 were isolated from a human [...]
[...] and freac-7 from a human genomic λ DASH library, [...]

- 

[α-32P]dATP and post-hybridization washes were carried out at
[...] Nucleotide sequences were determined on a Pharmacia A.L.F.
[...]

[...] -dATP.



For each gene a unique probe, located outside the conserved region,

[...]

poly(A) RNA from multiple human tissues. Probes were labelled with
[...] at full stringency (65°C, [...] SSC). Exposures ranged from
[...] to 1 week.



DNA fragments encoding the forkhead

[...] -7 were amplified with the following PCR primers: 5'-
^^ATlGGG^^aaSCGGCGCGAG. 5--GG^G.MTT- 

gaIg^ctcgcacttccg <f^^^^J^^ 

rGGtAGCCGCAG. 5'-AAAAGTCGACTCCTTGAGGTGCAG^CT- 
GTGtf^5"-GG^^ 

f^^GGTCGACGGG^GGAGCAGCGGCTGCC \freacr4y. artdS - - 
AAGAATTTCCTCGGGCCGGGCCGAGACCC^ 
CGACCTCC^GGGGCCCCGGGGCCCGG (freac-7). PCR products were
digested with EcoRI and SalI, cloned between [...]
pGEX-KG (Guan and Dixon, 1991) and their [...]
sequencing. Cultures of E.coli DH5α harbouring the respective pGEX-
[...] plasmids were induced with 0.5 mM IPTG at an [...] of
[...]. The heating was turned off and the cultures were allowed to [...]

under vigorous shaking, over a period [...]
SlrwursVBSwerecoll^^^ 

resuspended in ice-cold TNT (10 mM Tris-Cl, pH 8.0, 1 mM EDTA,
[...] Triton X-100). Approximately [...]
[...] immersed in ice water in an ultrasonic
[...] sonicated for 5 min. The lysed [...]
SW55 ultracentrifuge tubes and the [...]

[...]
[...] centrifugation in SW55. Glycerol was added to the cleared lysate to
[...] DTT to 2 mM and aliquots were frozen in liquid nitrogen. GST/
FREAC proteins were affinity purified on glutathione-Sepharose and
[...] glutathione in 50 mM Tris-Cl, pH 8.0. Glycerol and
DTT were added and aliquots frozen as described above.
CGC) was synthesized [...] 5 µg

graphy. 6 µg of [...] in 10 mM
of SalI-XbaI primer (5'-[...]) [...] mM DTT heated to [...]°C and
Tris-Cl, pH 7.5, [...]
cooled slowly to 37°C [...] was incubated at 37°C for
Klenow enzyme were added and [...]

30 min. [...] conversion [...] theoretically

by running ».^«2£SSl? •«> H» ,X BB (2 ° mM 
12 us; was P^'P^r^^^ea,- 0.5 mM EDTA; 10% 
HEPES; pH 7.9; 50 mM KCl. 2 " DTT; 0 J mM 

bacterial extract added varied dqwuimg ^^iirid to -10 rig of fusion 
GSWRI^Cpro.e.n^ 
protein. 50(^ 

added, the tube was rocked gently for [...] was aspirated

by a 1 min centrifugation at 5 k r.p.m. The supernatant [...]
and the pellet washed [...]
the first and last wash. The [...]
in 100 µl [...] PCR buffer [...]

Xhul-toW primer <5 ^^^^ follows: % <>c. I min: 60°C. 
dNTPs. Amplification was pofowd as follows ^ ^ ^ 
30 s for 30 cycles followed by 1 f At K R was 

checked on a 

precipitated, resuspended in [...] identical to the
Spin-X. The binding reaction [...]
first, except that 10 µl of [...] substituted for the
12 µg of random sequence [...] rounds of binding and
amplification were made [...] more of the target
cycles in the PCR [...] after the first round to
DNA was retained on the [...] PCR product from

sequenced for each FREAC protein.
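As an illustration of how a set of cloned, selected binding sites can be condensed into a degenerate consensus such as RTAAAYA, here is a minimal sketch. It assumes the sites have already been aligned to equal length, and it uses an arbitrary frequency threshold (`min_fraction`) that is an assumption of this sketch, not a parameter reported in the paper. The function name `consensus` and the input sequences are hypothetical.

```python
from collections import Counter

# IUPAC ambiguity code for every non-empty set of bases.
IUPAC_CODE = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def consensus(sites, min_fraction=0.25):
    """Derive an IUPAC consensus from aligned sites of equal length.

    A base is kept at a position if it occurs in at least `min_fraction`
    of the sites; positions with no qualifying base are reported as N.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to equal length")
    out = []
    for column in zip(*sites):
        counts = Counter(base.upper() for base in column)
        kept = frozenset(
            b for b, n in counts.items() if n / len(column) >= min_fraction
        )
        out.append(IUPAC_CODE.get(kept, "N"))
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical aligned sites, not data from the paper.
    print(consensus(["GTAAACA", "ATAAATA", "GTAAATA", "ATAAACA"]))
```

Raising `min_fraction` makes the consensus stricter, so that rare bases at a position no longer widen the ambiguity code.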

linearized [...]
phosphatase and [...]

-tide kinase, or 3'-labelled with [...] purified. 20 000 c.p.m.

Cerenkov of probe were incubated [...] µg poly(dIC) and [...]

method of Jones et al. (1988).

[...] of SWAP proteins [...] we combined
To generate chimeric [...]

the mutagenesis method of Nelson and [...]



overhangs
quantitated by excision of [...] and counting [...]

counter or by scanning of autoradiograms followed by [...]
ImageQuant software (Molecular Dynamics).



A^CAtGATeCCCTTCGCC < SWAP-, and ^ggggg 
TACGACAAPACGG ^^^J^^^GAC^A 
AGGTCATC (S^^and^ Irl^^S^C/iCTA (T7- 
(SWAP-4 and *>-^*£^^ 
Wrner).*™^^ 
<GST-36mer) ■«n^^ A A«^ '^i^^ 
TITATTITTAATTTrCTTTCAAATAC^ 

GGTGGTGGCGAC ([...]-88mer). [...]
the relevant SWAP primer on a [...] template
(FREAC-3 for SWAP-5 to -8 and FREAC-4 for SWAP-[...]

the gel and quantitation by liquid scintillation counting. To express the



into the [...]

by digestion with EcoRI, [...]

nucleotide kinase and [...]
according to Thompson and Landy (1988) [...]
reactions were performed as described [...] measured and